Differentiation‐related epigenomic changes define clinically distinct keratinocyte cancer subclasses

Abstract Keratinocyte cancers (KC) are the most prevalent malignancies in fair‐skinned populations, posing a significant medical and economic burden to health systems. KC originate in the epidermis and mainly comprise basal cell carcinoma (BCC) and cutaneous squamous cell carcinoma (cSCC). Here, we combined single‐cell multi‐omics, transcriptomics, and methylomics to investigate the epigenomic dynamics during epidermal differentiation. We identified ~3,800 differentially accessible regions between undifferentiated and differentiated keratinocytes, corresponding to regulatory regions associated with key transcription factors. DNA methylation at these regions defined AK/cSCC subtypes with epidermal stem cell‐ or keratinocyte‐like features. Using cell‐type deconvolution tools and integration of bulk and single‐cell methylomes, we demonstrate that these subclasses are consistent with distinct cells‐of‐origin. Further characterization of the phenotypic traits of the subclasses and the study of additional unstratified KC entities uncovered distinct clinical features for the subclasses, linking invasive and metastatic KC cases with undifferentiated cells‐of‐origin. Our study provides a thorough characterization of the epigenomic dynamics underlying human keratinocyte differentiation and uncovers novel links between KC cells‐of‐origin and their prognosis.


Introduction
The epidermis constitutes the first line of defense of the human body against environmental damage. This stratified squamous epithelium is mainly composed of keratinocytes, which arise from epidermal stem cells (EpSCs) located at the basal layer of the epidermis (Gonzales & Fuchs, 2017;Moreci & Lechler, 2020). As EpSCs start differentiating, they detach from the basement membrane and migrate upwards, resulting in distinct differentiated keratinocyte populations (i.e., spinous, granular, and cornified) (Gonzales & Fuchs, 2017;Moreci & Lechler, 2020). Terminally differentiated keratinocytes are continuously desquamated. As such, the homeostatic epidermis is subjected to a constant turnover, which is regulated by a fine-tuned balance between self-renewal and differentiation (Blanpain & Fuchs, 2009).
Keratinocyte cancers (KC), also known as non-melanoma skin cancers (NMSC), originate from epidermal keratinocytes. They represent the most common malignancies worldwide in the fairskinned population, with an incidence 20 times higher than that of melanoma, the other major skin cancer (Apalla et al, 2017b;Fitzmaurice et al, 2019;Stang et al, 2019). The incidence of KC has alarmingly risen over the last decade, with an increase of~33% in the total number of cases worldwide between 2007 and 2017 (Fitzmaurice et al, 2019). These numbers illustrate why, despite a lower mortality rate, KC are associated with significant morbidity and a heavy burden on public health systems (Mudigonda et al, 2010;Apalla et al, 2017b;Fitzmaurice et al, 2019). Two distinct malignancies account for 99% of all KC: basal cell carcinoma (BCC) and cutaneous squamous cell carcinoma (cSCC) (Apalla et al, 2017a;Barto s & Kullov a, 2018). Even though cSCC represents only 20% of KC cases, it accounts for the vast majority of deaths associated with such malignancies, as about 5% of the tumors metastasize, with a mortality rate exceeding 70% (Ratushny et al, 2012;Burton et al, 2016). In contrast, the estimated metastatic potential of BCC is less than 0.05% (Apalla et al, 2017a). Most invasive cSCCs arise either from a precancerous dysplasia known as actinic keratosis (AK) or from an in situ carcinoma known as Bowen's disease (BD), with a progression rate of 0.025-16% and 3-5% per year and event, respectively (Ratushny et al, 2012;Burton et al, 2016). However, the molecular mechanisms underlying their progression to invasive cSCC remain largely unknown.
DNA methylation is a dynamic epigenetic modification that mainly occurs in the context of CpG dinucleotides at the carbon-5 position of cytosines (Lyko, 2018). Catalyzed by a set of three methyltransferases (DNMT1, DNMT3A, and DNMT3B), it has a strong influence on gene expression and other essential genetic functions (Lyko, 2018). Consequently, DNA methylation is essential for the establishment and maintenance of cellular identity (Lyko, 2018;Greenberg & Bourc'his, 2019). Disruption of normal DNA methylation patterns is currently considered a hallmark of cancer, which presents a characteristic genome-wide hypomethylation and regional hypermethylation (Jones & Baylin, 2007). Importantly, tumor methylomes not only include cancer-specific methylation changes, but also partially maintain the DNA methylation patterns of their tumor-initiating cell (Kulis et al, 2013;Moran et al, 2016). In fact, a systematic study concluded that cell-of-origin-related patterns are the main variable influencing tumor stratification in many tumor entities (Hoadley et al, 2018).
Epidermal differentiation has been associated with dynamic changes in DNA methylation. In mice, keratinocyte differentiation was associated with a general loss of DNA methylation at lineagespecific regulatory elements, while methylation gains occurred at regulatory regions of other lineages (Bock et al, 2012). Similarly, a loss of DNA methylation in the promoter region of roughly 60% of the genes induced upon keratinocyte differentiation in vitro has been observed in humans (Sen et al, 2010). In agreement with these findings, we have previously identified two subclasses of AK and cSCC based on their methylation patterns and that we interpreted to arise from keratinocytes at two distinct epidermal differentiation stages: one more closely related to the EpSCs and one to a more differentiated keratinocyte (Rodr ıguez-Paredes et al, 2018a). However, direct proof for this interpretation has been lacking and the subclasses were not characterized in detail.
Here, we performed an integrated analysis of the chromatin dynamics associated with human epidermal differentiation using single-cell multi-omics and transcriptomics approaches. We identified more than 3,800 differentially accessible regions between undifferentiated and terminally differentiated keratinocytes. Further characterization of these regions revealed that they comprised regulatory regions associated with known but also novel epidermal differentiation transcription factors. Tumor stratification based on the DNA methylation patterns found at these differentially accessible regions identified two subtypes of AK and cSCC with EpSC-like and keratinocyte-like features. Importantly, we also show for the first time DNA methylation dynamics in the human epidermis at singlecell resolution, which we studied with single-cell combinatorial indexing for methylation analysis (sci-MET) (Mulqueen et al, 2018), after addressing important shortcomings of the original protocol. The integrative analysis of bulk and single-cell methylation datasets, as well as the use of deconvolution tools based on scRNA-seq, provided direct evidence of the cell-of-origin interpretation of the AK/ cSCC subtypes. Furthermore, epigenomic data analyses using a mitotic-like clock and the stratification of an expanded dataset, which included BCC and other yet unstratified epidermal entities, suggested a more invasive phenotype and higher metastatic potential for tumors arising from undifferentiated keratinocytes. All in all, our DNA methylation-based tumor stratification strategy may represent an important advance in the risk assessment of KC patients.

Single-cell multi-omics analysis of healthy human epidermis
To investigate differentiation-related epigenomic changes in the human epidermis at the single-cell level, we used a combination of single-cell multi-omic and transcriptomic approaches. First of all, we generated a single-cell multi-omics (scATAC-seq + scRNA-seq) dataset from two sun-protected healthy epidermis samples (55 and 72 y/o, male). A total of 5,565 cells passed the quality control for both genomic layers and were integrated into a common dataset to avoid batch effects ( Fig EV1A). Unsupervised clustering identified 10 cell clusters, which were visualized using a joint uniform manifold approximation and projection (UMAP) representing both gene expression and chromatin accessibility ( Fig EV1B). To identify the cell identity of each cluster, we also generated a reference scRNAseq dataset by combining our own data generated from a sunprotected healthy epidermis sample from a 30 y/o male donor (Appendix Fig S1A and B), with a matching subset of sun-protected healthy epidermis from three donors (Cheng et al, 2018a;Data ref: Cheng et al, 2018b). All four samples were obtained from the trunk area and did not display significant differences. The integrated dataset contained 32,272 high-quality cells and their unsupervised clustering defined 13 cell clusters, which comprised cells from all donors (Appendix Fig S1C and D). These included six archetypical keratinocyte populations: two basal undifferentiated populations, two mitotic clusters, and the well-differentiated spinous and granular keratinocytes (Fig EV2A and B,and Dataset EV1). Highly specialized keratinocyte populations such as channel or proinflammatory keratinocytes were also detected ( Fig EV2A and B, and Dataset EV1). Lineage inference using RNA velocity analysis was possible with our own dataset and placed the Basal 1 population at the beginning of the differentiation process ( Fig EV2C and Appendix Fig S2). The trajectory then progressed to the mitotic keratinocytes and, lastly, to the well-differentiated spinous population ( Fig EV2C). Hence, these results suggest that the main EpSC population is contained in the Basal 1 cluster.
After cell annotation based on the reference scRNA-seq dataset, most keratinocyte populations identified in the scRNA-seq experiment were also detected in the multi-omics dataset ( Fig 1A). Of note, our multi-omics analysis showed the expected chromatin accessibility and gene expression dynamics for several established epidermal differentiation markers. For instance, ATAC peaks associated with either the basal keratinocyte gene marker KRT5 or the suprabasal differentiated keratinocyte gene marker KRT10, lost or gained accessibility as they became less or more expressed along the differentiation trajectory, respectively ( Fig 1B). Consistently, we observed an increase in co-accessibility in the epidermal differentiation complex (EDC), a genomic region containing multiple genes related to terminal differentiation and cornification (Kypriotou et al, 2012), in terminally differentiated keratinocytes (Spinous and Granular, cluster 3) compared with basal undifferentiated keratinocytes (Basal 1, cluster 2, Fig 1C).
To further characterize differentiation-related changes occurring at the chromatin level, we compared the genome accessibility in basal undifferentiated keratinocytes (Basal 1, cluster 2) and in terminally differentiated keratinocytes (Spinous and Granular, cluster 3). This comparison identified 3,838 differentially accessible peaks, of which 1,659 were only accessible in undifferentiated keratinocytes and 2,179 were only accessible in differentiated keratinocytes ( Fig 1D, Dataset EV2). Motif enrichment analysis for each set of accessible peaks identified cell-type-specific overrepresentation of transcription factor (TF) binding motifs (Fig 1D). For example, TFbinding motifs associated with key regulators of epidermal stem cell proliferation and differentiation such as TP63 (Soares & Zhou, 2018) and OVOL1 (Lee et al, 2014) were enriched in peaks that were    specific to undifferentiated keratinocytes ( Fig 1D). In contrast, TFbinding motifs from members of the CEBP family, which are associated with terminal differentiation in keratinocytes (Borrelli et al, 2007;Lopez et al, 2009), were enriched in peaks specific to differentiated keratinocytes ( Fig 1D). To refine our multimodal analysis, we then combined motif activity scores calculated using chromVAR and gene expression data in order to identify the transcription factors with specifically enriched expression and motif accessibility in each cell cluster. This identified key regulators of the basal undifferentiated keratinocytes, including MEF2A, TEAD1, IRF1, TP63, and NFKB and key transcription factors of terminally differentiated keratinocytes, including GRHL1, RORA, CEBPA, NR1D1, or SREBF2 ( Fig 1E and F). While most of these transcription factors have been previously found to play important roles in epidermal differentiation (Truong et al, 2006;Dai et al, 2013;Gulati et al, 2013;Mlacki et al, 2014;Yuan et al, 2020), MEF2A has not yet been associated with this process. Gene regulatory networks analysis using single-cell regulatory network inference and clustering (SCENIC) (Aibar et al, 2017) on the transcriptomics data of our multi-omics dataset also identified MEF2A and TEAD1 as key transcription factors for undifferentiated keratinocytes, and SREBF2, CEBPA, GRHL1, and NR1D1 for differentiated keratinocytes ( Fig 1G). Altogether, our multiomics data recapitulated known accessibility and gene expression dynamics during epidermal differentiation and identified potential new key regulators, such as MEF2A.

DNA methylation at differentially accessible regions defines AK/ cSCC subtypes
To investigate whether the differentially accessible peaks detected during epidermal differentiation corresponded to regulatory regions, such as gene promoters or enhancers, we made use of published ChIP-seq data for several histone marks generated on normal human epidermal keratinocytes (NHEK). Accessible peaks from basal and differentiated keratinocytes showed no enrichment for the repressive chromatin mark H3K27me3, in agreement with their open state (Fig 2A). On the contrary, differentiated keratinocytespecific peaks showed a strong correlation with H3K27ac and H3K4me1, two histone marks that are associated with active enhancers (Creyghton et al, 2010) (Fig 2A). Furthermore, undifferentiated keratinocyte-specific peaks were enriched for H3K27ac and H3K4me2/me3, histone marks that are associated with gene promoters and actively transcribed regions (Bernstein et al, 2005; Orford et al, 2008) (Fig 2A). Altogether, our analyses indicate that the differentially accessible regions identified in our scATAC-seq data correspond to regulatory regions associated with key regulators of epidermal differentiation. DNA methylation cooperates with chromatin accessibility to establish and maintain cellular identity (Guo et al, 2016;Li et al, 2021). Furthermore, DNA methylation patterns at regulatory regions have been used to define the cellular origin of several human cancer types (Kulis et al, 2013;Moran et al, 2016;Hoadley et al, 2018). To assess whether the methylation patterns at the differentially accessible regions between undifferentiated and differentiated keratinocytes would be informative for identifying the cellular origin of epidermal tumors, we extracted the CpGs located in the 3,838 differentially accessible peaks and that can be interrogated with probes on the Infinium EPIC array. This identified 2,925 CpG probes located in undifferentiated keratinocyte-specific peaks and 1,426 CpG probes located in differentiated keratinocyte-specific peaks. These probes covered 914 and 864 peaks accessible exclusively in undifferentiated or differentiated keratinocytes, respectively. In agreement with the histone modifications landscape of each set of accessible regions, the CpGs located in undifferentiated keratinocyte-specific peaks were mostly located in promoter-associated CpG islands while the CpGs located in differentiated keratinocyte-specific peaks were mostly located in OpenSea regions, which are often associated with enhancers ( Fig 2B). We then combined 21 newly generated AK and cSCC methylomes with a published dataset comprising healthy, AK and cSCC epidermis samples (Rodr ıguez-Paredes et al, 2018a; Data ref: Rodr ıguez-Paredes et al, 2018b). Unsupervised clustering of all 12 healthy, 20 AK and 35 cSCC epidermal samples based on the methylation patterns of either the 914 undifferentiated keratinocyte-specific peaks or the 864 differentiated keratinocytespecific peaks clearly stratified the AK and cSCC methylomes into two groups, one with EpSC-like features and another one with keratinocyte-like features ( Fig 2C). The two subclasses were also clearly separated in a Principal Component Analysis (PCA) based on all CpG probes ( Fig 2D). Importantly, this separation was not related to differences in sample purity, as both subclasses showed a very high degree of tumor cell purity (Appendix Fig S3).

Single-cell methylation analysis of keratinocyte differentiation
To further refine the cells-of-origin of AK/cSCC, we combined these bulk methylation datasets with single-cell methylation data from ◀ Figure 1. Single-cell multi-omics characterization of human epidermal differentiation.
A Joint UMAP plot depicting both scATAC-seq and scRNA-seq data from 5,355 keratinocytes from sun-protected human epidermis (n = 2). Color depicts the unsupervised clustering (left) as well as cell-type annotation based on the reference scRNA-seq dataset (right). B Representative examples of chromatin accessibility and gene expression of KRT5 and KRT10, two key epidermal differentiation-related genes, in the main keratinocyte populations. C Co-accessibility at the human epidermal differentiation complex (EDC) in undifferentiated (cluster 2) and differentiated (cluster 3) keratinocytes. Only connections (arcs) with a co-accessibility score above 0.25 are plotted. Gray boxes below tracks represent scATAC-seq peaks. D Left: Heatmap displaying the differentially accessible peaks between undifferentiated (cluster 2) and differentiated (cluster 3) keratinocytes. Number of Tn5 insertion sites in each region was scaled by row. Right: DNA sequence motifs for the top six overrepresented transcription factor (TF) motifs in undifferentiated (cluster 2) and differentiated (cluster 3) keratinocyte-specific accessible peaks. E Heatmaps displaying the predicted top five transcription factors in each cell cluster using chromVAR motif activity (left) and gene expression (right). F chromVAR deviations (in quantiles) and gene expression for representative enriched TF projected onto the joint UMAP plot of keratinocytes from the single-cell multiomics dataset. G Heatmap showing the top five active regulons in each cell cluster, based on the expression data of the multi-omics dataset. TF regulons also identified by other approaches are highlighted.
human epidermal cells. Thus, we performed single-cell combinatorial indexing for methylation analysis (sci-MET) (Mulqueen et al, 2018) from a sun-protected healthy epidermis sample that was obtained from a 62 y/o male donor. In order to obtain a dataset with a higher sequencing coverage, we generated three sci-MET libraries containing only around 200 epidermal cells each. This resulted in the detection of 554 cells after read alignment and single-cell demultiplexing, with an average CpG coverage per cell of 0.85% (0.14-6.88%), in agreement with published data (Mulqueen et al, 2018). Single cells showed a detectable heterogeneity and subclustering as well as differences in their methylation content, suggesting methylation changes within the population (Appendix Fig S4). We then performed a multidimensional scaling (MDS) analysis with the 554 single-cell methylomes and the 55 AK/cSCC and 12 healthy epidermis EPIC samples. Of note, 548 single-cell methylomes grouped closely with the healthy epidermis and keratinocyte-like tumors, while six cells clustered with the EpSC-like tumors ( Fig 3A). To assess whether these two cell clusters represented EpSC and differentiated keratinocytes, respectively, we examined their average methylation level at different genomic regions based on the NHEK ChromHMM segmentation (see Materials and Methods for details) (Ernst et al, 2011). Consistent with their keratinocyte identity, both cell clusters displayed low methylation levels in promoter regions that were designated as active in NHEKs and high methylation in regions that were designated as actively transcribed in NHEKs ( Fig 3B). Also, repressed, heterochromatic, and repetitive regions displayed high methylation levels in both cell groups, as expected (Deplus et al, 2014) (Fig 3B). In agreement with the loss of DNA methylation in lineage-specific regulatory elements upon epidermal differentiation seen in mice (Bock et al, 2012) and humans (Sen et al, 2010), enhancer regions, and especially strong enhancers, were found to be less methylated in cells clustering with keratinocyte-like tumors ( Fig 3B). These findings further support the notion that the two cell clusters represent EpSCs and differentiated keratinocytes, respectively. Importantly, the genomic regions of AK/cSCC showed very similar methylation patterns compared with the single-cell profiles, with keratinocyte-like tumors displaying substantial enhancer hypomethylation ( Fig 3C). Taken together, these results provide important confirmation for the cell-of-origin interpretation of the methylation-based AK/cSCC subclasses. A Average histone modification profiles of undifferentiated and differentiated keratinocyte-specific peaks using previously published data generated on NHEK cells (ENCODE). The normalized signal of H3K27ac, H3K4me1/me2/me3, and H3K27me3 were measured in a window of AE 10,000 base pairs (bp). B Fractions of CpGs located within epigenomic substructures for the 4,351 InfiniumEPIC CpG probes found within undifferentiated and differentiated keratinocytespecific peaks. C Unsupervised hierarchical clustering of 12 healthy, 20 AK, and 35 cSCC epidermal samples based on the methylation status at undifferentiated and differentiated keratinocyte-specific peaks. Each row represents the average methylation value of all CpGs contained in a particular peak. D Principal Component Analysis (PCA) of 67 AK/cSCC and healthy controls performed with all detected CpGs after filtering (n = 632,778). Coloring is according to sample type and shape is according to cell-of-origin-related subclass. While we only detected two epidermal differentiation stages in our single-cell methylation analysis, we observed a higher number using single-cell transcriptomics and chromatin accessibility analyses. To assess whether the DNA methylation patterns represent several transcriptomic states and to further explore the keratinocyte composition of AK/cSCC, we used computational deconvolution of cell-type fractions in the bulk DNA methylation datasets based on scRNA-seq data . After characterization of the reference scRNA-seq dataset (Fig EV2), we generated the reference expression and DNA methylation matrices for bulk methylome deconvolution (Fig 3D; see Materials and Methods). Cell fraction estimation revealed an overall similar keratinocyte composition in  TNFSF10  TMEM154  RAB25  LGALS7  AZGP1  HPGD  HAL  PI3  LCE3D  SSFA2  POF1B  BTC  XG  TM7SF2  IL1R2  PVRL4  TNNC1  GDPD3  PRSS3  USP2  GATB  SLPI  EMB  RRAD  C6orf132  ACPP  IFI30  IL16  C11orf71  SPTLC3  MYO1E  ITGAE  IGFBP5  SERPINH1  PHF14  DEFB1  CHP2  for (C) EpSC-like and keratinocyte-like AK/cSCC from the EPIC dataset. ChromHMM state 1: active promoter; state 2: weak promoter; state 3: inactive/poised promoter; state 4: strong enhancer; state 5: strong enhancer; state 6: weak/poised enhancer; state 7: weak/poised enhancer; state 8: insulator; state 9: transcriptional transition; state 10: transcriptional elongation; state 11: weak transcription; state 12: polycomb-repressed; state 13: heterochromatin/low signal; state 14: repetitive/copy number variation; state 15: repetitive/copy number variation. D Heatmaps displaying the gene expression (upper) and the imputed promoter DNA methylation (lower) reference matrices for keratinocyte populations involved in terminal differentiation, calculated by EpiSCORE. The weight of each gene in the reference DNA matrix is also depicted. E Boxplots displaying estimated cellular fractions of the keratinocyte populations involved in the differentiation trajectory in keratinocyte-like and EpSC-like AK/cSCC as well as in healthy epidermis.
healthy epidermis and keratinocyte-like tumors, with higher proportions of well-differentiated spinous and granular keratinocytes ( Fig 3E). In contrast, EpSC-like AK/cSCC showed enrichment for the EpSC-containing Basal 1 population ( Fig 3E). Collectively, these results indicate that the two methylation profiles reflect transcriptionally distinct epidermal differentiation stages. Furthermore, the methylation-based AK/cSCC subclasses display a differential enrichment for keratinocyte populations at the start and end of the lineage trajectory. These findings thus provide further confirmation for their EpSC-like and keratinocyte-like origin, respectively.

Methylation-based subclasses display distinct phenotypic features
To further characterize the two cell-of-origin-based subclasses identified in the bulk methylome analysis, we used mitotic clock algorithms (Teschendorff, 2020). We observed an increased mitotic age in both tumor types in comparison with healthy epidermis (Fig 4A). This increase was more pronounced in the keratinocyte-like subgroup, while the EpSC-like tumors showed a more moderate effect ( Fig 4A). Subsequent calculations of the intrinsic stem cell division rate (SCDR) estimated an SCDR of 10.35 divisions per stem cell and year in the healthy epidermis ( Fig 4B). This is similar to experimentally assessed division rates for human EpSC (Maeda, 2017). In agreement with the increased mitotic age, we also observed an increase in the SCDR in AK/cSCC, with a more pronounced effect in keratinocyte-like tumors (SCDR = 39.7; Fig 4B). On the contrary, the EpSC-like tumor subclass again showed a more moderate effect (SCDR = 25, Fig 4B). These results suggest different proliferation rates for the two cell-of-origin-based subclasses. Cellular proliferation and invasion are known to be two independent processes with a high degree of anticorrelation in cancer cells (Gao et al, 2005;Hoek et al, 2008;Hecht et al, 2015). Thus, we hypothesized that a less proliferative phenotype in EpSC-like tumors could indicate a higher invasiveness in these cases. Following this line of thought, we investigated whether the EpSC-like tumors displayed epigenetic features of an invasive phenotype by analyzing the methylation status of three miRNAs known to be silenced by promoter hypermethylation in cancer cells displaying epithelial-tomesenchymal transition (EMT) (Wiklund et al, 2011;Davalos et al, 2012). Indeed, promoter regions for the MIR200C/141 cluster and MIR205 were found to be highly methylated, specifically in the EpSC-like subclass ( Fig 4C). Furthermore, immunofluorescence staining of ZEB2, an EMT-driving transcription factor repressed by the miR200 family and miR205 (Gregory et al, 2008;Park et al, 2008), showed an increase in ZEB2-positive nuclei in EpSC-like cSCC samples (Fig 4D). These results are consistent with distinct proliferative and invasive characteristics for the two AK/cSCC cellof-origin-based subclasses.

Methylation-based subclasses display distinct clinical features
To assess whether the phenotypic differences observed between cell-of-origin-based subclasses also resulted in distinct clinical features, we analyzed the DNA methylation patterns of additional epidermal tumors with different metastatic potentials. Thus, we generated new EPIC datasets containing 11 in situ squamous cell carcinoma (Bowen's disease; BD), another type of pre-invasive lesion leading to invasive cSCC, 14 basal cell carcinoma (BCC), the most common KC in the general, immunocompetent population, and 10 non-cancerous senile warts (seborrheic keratosis, SK). These newly generated datasets were combined with our dataset containing healthy, AK and cSCC samples. Tumor stratification based on methylation patterns at the differentiated keratinocyte-specific accessible regions identified in our scATAC-seq uncovered the two previously described cell-of-origin-related subclasses, which could again be observed in a PCA performed with all CpG probes (Figs 5A and EV3A). Similar to AK and cSCC samples, BD lesions were also stratified into the two cell-of-origin-based subclasses ( Fig 5A). Moreover, further analyses indicated almost indistinguishable methylomes between precursor AK lesions, in situ carcinomas, and cSCC arising from the same cell type, but highlighted major epigenetic differences between cell-of-origin-based subclasses (Fig EV4). In contrast, rarely metastatic BCC and non-cancerous SK cases were almost exclusively classified as keratinocyte-like (Figs 5A and EV3A). Our analysis thus stratifies all the main keratinocyte cancer entities according to two main initiating cells-of-origin and suggests a bias toward lower metastatic potential for the more differentiated subclass.
In addition, we investigated a published dataset comprising five sclerodermiform BCC (sBCC) tumors (Sand et al, 2019a;Data ref: Sand et al, 2019b). sBCC is considered particularly aggressive as it presents high recurrence rates as well as higher local invasiveness (Sand et al, 2019a;Conforti et al, 2021), which predicted an enrichment of EpSC-like methylation patterns. Indeed, methylation patterns at the accessible regions identified three out of five samples as EpSC-like (Figs 5B and EV3B). When these datasets were integrated with our BCC datasets, which also included three sBCCs, only four out of 19 samples were classified as EpSC-like, three of which were from the sBCC subtype (Appendix Fig S5). These results indicate an enrichment for aggressive sBCC cases in the EpSC-like BCC subclass.
Furthermore, we analyzed a published EPIC dataset consisting of 12 common warts (verruca vulgaris; VV), another type of benign epidermal tumor linked to human papillomavirus infection (Al-Eitan et al, 2020). In accessible regions, methylation patterns of VV samples indicated a more differentiated cell-of-origin that was highly similar to those present in healthy epidermis (Figs 5C and EV3C). VV methylation patterns also appeared similar to those in SK, thus indicating a differentiated keratinocyte-like cell-of-origin for both senile (SK) and common (VV) warts.
Lastly, we stratified another published dataset, consisting of five AK and a collection of 18 invasive cSCC samples, ranging from initially invasive to metastatic (Herv as-Mar ın et al, 2019a; Data ref: Herv as-Mar ın et al, 2019b). Strikingly, this dataset was classified completely as EpSC-like (Figs 5D and EV3D), further suggesting a more invasive phenotype in EpSC-like cSCCs. To further validate these observations, we profiled the methylome of eight cSCC metastases and three metastasizing primary cSCC. These samples were obtained as formalin-fixed paraffin-embedded (FFPE) sections, and tumor tissue was isolated by laser microdissection to ensure high sample purity (Appendix Fig S6). Methylation analysis of differentiated keratinocyte accessible regions classified all 11 samples as EpSC-like (Figs 5E and EV3E), thus again suggesting the higher invasiveness and metastatic potential of this subclass.

Discussion
The roles of chromatin accessibility and DNA methylation in establishing cell identity throughout lineage differentiation are well accepted (Guo et al, 2016;Greenberg & Bourc'his, 2019;Li et al, 2021). Moreover, cancer methylomes reflect the epigenetic programs of the tumor-initiating cell, which can be used to define distinct cell-of-origin-based tumor subclasses, often with clinical implications (Kulis et al, 2013;Moran et al, 2016;Hoadley et al, 2018). Our results show that DNA methylation patterns found at differentially accessible regions between undifferentiated and differentiated keratinocytes define human KC subtypes through their cellsof-origin. These results confirm previous observations for AK and cSCC (Rodr ıguez-Paredes et al, 2018a) with an analytical framework that is completely based on single-cell data from human epidermis samples. It is important to notice that the set of differentially accessible peaks identified in our analysis shows little overlap with the set of enhancers defined in an in vitro system of epidermal stem cell differentiation ) (Appendix Fig S7), which further highlights the importance of our in vivo approach. Paired multimodal single-cell profiling provides new opportunities to study differentiation processes and for characterizing important cell states. Furthermore, combining various genomic read-outs is important for the identification of key transcription factors involved in differentiation trajectories. Our single-cell multi-omics analysis of the healthy human epidermis identified several wellknown transcription factors associated with undifferentiated keratinocytes (i.e., TP63, TEAD1, and IRF1) (Truong et al, 2006;Yuan et al, 2020) and with terminally differentiated keratinocytes (i.e., CEBPA, GRHL1, and SREBF2) (  . Of note, this approach also identified MEF2A as one of the key regulators for undifferentiated basal keratinocytes. MEF2A has been previously reported to exert important roles in the differentiation of several cell lineages such as the skeletal muscle and neuronal systems (Estrella et al, 2015;Zhu et al, 2018). However, it has not been linked to epidermal differentiation yet.
Single-cell DNA methylation approaches hold great potential to further explore the role of this epigenetic modification in the context of cellular differentiation. However, their development is still in early stages, with limited applicability (Karemaker & Vermeulen, 2018). Combinatorial indexing (Adey et al, 2014;Amini et al, 2014) could provide a solution to many current limitations due to its high scalability. However, the published protocol for single-cell combinatorial indexing to whole-genome bisulfite sequencing (Mulqueen et al, 2018) contains important inaccuracies and methodological gaps that have so far precluded a wider application of the method. After successfully addressing these shortcomings, we generated a sci-MET library containing 554 single cells with an average CpG coverage of 0.85%. Single-cell methylomes of human epidermal cells showed differences in average methylation content as well as a certain degree of dissimilarity, thus suggesting dynamic methylation changes upon differentiation in vivo. Single-cell DNA methylation studies are also critically important for identifying the exact cell type that provides the tumor cell-of-origin. Thus, we combined our extensive bulk AK/cSCC methylation dataset with the 554 single-cell methylomes, which identified six out of the 554 cells as EpSCs and clustered them with the EpSC-like tumors. Human EpSCs have been previously estimated to represent approximately 1% of the epidermal cells (Rachidi et al, 2007). Our results are consistent with these numbers and validate the cell-of-origin-based stratification of human AK/cSCC.
We also used scRNA-seq data from more than 32,000 epidermal cells to define the main keratinocyte populations along the epidermal differentiation trajectory. We identified a cell composition that was similar to previous reports (Ji et al, 2020;Cheng et al, 2018a). However, contrary to the multiple branches identified in another scRNA-seq analysis using a graph-based approach (Cheng et al, 2018a), our RNA velocity analysis suggests a single differentiation trajectory. Interestingly, our results suggest that only one of the two basal populations found in adult human epidermis (Basal 1) is located at the beginning of the differentiation trajectory, followed by the mitotic and more differentiated keratinocyte populations. KRT19-expressing Basal 2 cells were not part of the trajectory, which is consistent with previous reports (Michel et al, 1996;Pontiggia et al, 2009). Such keratinocytes are known to be self-renewing but not involved in terminal differentiation and might represent a stem cell reservoir for the interfollicular epidermis (Pontiggia et al, 2009). Deconvolution of bulk AK/cSCC methylation patterns using these scRNA-seq data detected distinct keratinocyte populations in both keratinocyte-like and EpSC-like DNA methylation subclasses. These results indicate that the methylation profiles defined by single-cell methylomics do not represent completely homogeneous keratinocyte populations. Furthermore, tumor heterogeneity is also consistent with published scRNA-seq results of human cSCC that revealed the presence of undifferentiated, mitotic, and well-differentiated keratinocyte populations, even when they share a common initiating cell (Ji et al, 2020). Importantly, our approach also indicated a strong enrichment for Basal 1 keratinocytes in EpSC-like tumors, while finding differentiated spinous and granular cells in the keratinocyte-like cases.
DNA methylation clocks are compound biomarkers that are increasingly used in cancer research (Yang et al, 2016;Duran-Ferrer et al, 2020;Teschendorff, 2020). When we used DNA methylation clocks to calculate the mitotic age and stem cell division rates (SCDR) in epidermal tumors, our results revealed a general increase in mitotic age and SCDR in AK/cSCC in comparison with the healthy epidermis, as described for several other malignancies (Yang et al, 2016;Teschendorff, 2020). While we observed a substantial increase in the keratinocyte-like subgroup, EpSC-like tumors showed a more moderate increase, which we interpreted to reflect a less proliferative but more invasive phenotype. This was confirmed by methylation analysis of miRNAs from the miR-200 family (i.e., miR-200-c and miR-141) and miR-205, which play an essential role in maintaining epithelial phenotypes by targeting the E-cadherin transcriptional repressors ZEB1 and ZEB2 (Gregory et al, 2008;Park et al, 2008). Silencing of these miRNAs by promoter hypermethylation has been described in several human cancer cell lines displaying EMT features (Neves et al, 2010;Davalos et al, 2012) and invasive epithelial human malignancies, such as muscle-invasive bladder cancer (MIBC) (Wiklund et al, 2011). Consistent with a more invasive phenotype, we observed promoter hypermethylation in MIR200C/141 and MIR205 genes and increased ZEB2-positive nuclei in EpSC-like tumors. Hence, our analysis proposes cell-oforigin-dependent differences in the invasive phenotype of AK/cSCC and thus establishes novel opportunities for the development of risk stratification biomarkers.
Following this line of thought, we expanded our DNA methylation-based stratification to other epidermal tumor entities with different metastatic potential, including in situ carcinoma (BD), rarely metastatic BCC, and non-cancerous senile and common warts. All entities could be stratified again into two subclasses displaying either EpSC-like or keratinocyte-like methylation profiles, indicating that the previously described bimodal cell-of-origin model can be applied to a wide range of epidermal tumors. Importantly, our analysis uncovered a prominent bias toward a more differentiated cell-oforigin for entities bearing a lower metastatic potential such as BCC or non-cancerous warts. In contrast, most EpSC-like tumors belonged to entities with a higher metastatic potential (cSCC), including precursor lesions (AK and BD) that can progress to metastatic cSCC if left untreated. The consistent classification of invasive cSCC samples, cSCC metastases and primary metastasizing cSCC samples entirely as EpSC-like tumors further supported this notion.
Altogether, our study thus provides novel insight into the role of chromatin accessibility and DNA methylation in epidermal differentiation and KC initiation and proposes a general stratification strategy for epidermal tumors that might improve patient risk assessment.

Samples
For scRNA-seq, single-cell multi-omics and sci-MET experiments, we obtained remnant clinically healthy whole skin from patients undergoing routine surgery at Heidelberg University Hospital. Samples were obtained from the sun-protected ilioinguinal region after written informed consent by the patients, in compliance with current legislation and as approved by the Ethics Committee of Heidelberg University (no. S-091/2011). All samples used for these experiments were obtained from male donors.
Punch biopsies (4-mm) from four AK, 17 cSCC, 11 BD, 14 BCC, and 10 SK samples (Table EV1) were obtained at the Department of Dermatology of the Heidelberg University Hospital, as approved by the ethics committee of Heidelberg University (protocol no. S-091/ 2011). Moreover, 12 healthy, 16 AK, and 18 cSCC samples previously analyzed and used for publication were also included in this study (Rodr ıguez-Paredes et al, 2018a; Data ref: Rodr ıguez-Paredes et al, 2018b). All samples were immediately immersed in liquid nitrogen after resection and stored at À80°C. Epidermal regions of the tumors were separated from the dermis by heat-split (incubated in pre-warmed PBS at 37°C for 1 min and then at 56°C for up to 5 min) and carefully dissected manually under a magnifying glass. Only tumor samples for which proper epidermis isolation was achieved were included in the study. DNA was isolated using the QIAamp DNA Investigator Kit (Qiagen) following the manufacturer's instructions.
Metastatic primary cSCC and cSCC metastasis samples (Table EV1) were obtained as 7-lm FFPE sections provided by the tissue bank of the National Center for Tumor Diseases (NCT Heidelberg, Germany) and the Department of Dermatology of Heidelberg University Hospital, in accordance with the regulations of the tissue bank and the approval of the ethics committee of Heidelberg University (protocol no. S-091/2011). Sections were placed on Mem-braneSlide NF 1.0 PEN (Zeiss) slides, and tumor tissue was isolated by laser microdissection using the Zeiss PALM MicroBeam system (Zeiss). DNA was subsequently isolated using the QIAamp DNA Micro Kit (Qiagen) following the manufacturer's instructions.
Diagnosis and histopathological features of both fresh-frozen (FF) and FFPE tumor samples obtained at Heidelberg University Hospital were routinely established by an expert dermatohistopathologist and reviewed before inclusion in this study.
Handling of samples and data was performed in a pseudonymized manner, also in strict compliance with the current legislation and institutional guidelines for data protection and privacy of the participating patients.

Single-cell multi-omics sequencing
Healthy whole skin biopsies were obtained from two male donors of fair-skin type (55 and 72 y/o) and preserved in MACS Tissue Storage Solution (Miltenyi Biotec). Samples were subsequently cut into small pieces that were further processed using the Epidermis Dissociation Kit, human (Miltenyi Biotec) and the Gentle MACS Dissociator (Miltenyi Biotec), following the manufacturer's instructions. Nuclei were isolated from the resulting single-cell suspension using the lysis buffer recipe described in (Wysocka et al, 2001) In brief, epidermal cells were resuspended in Buffer A (10 mM HEPES pH 7.9, 10 mM KCl, 1.5 mM MgCl 2 , 0.34 M sucrose, 10% glycerol, 1 mM DTT, and 1X protease inhibitor cocktail) containing freshly added Triton X-100 at a final concentration of 0.1%. Cells were resuspended in a 1 × 10 6 cells/25 ll of Buffer A ratio and were incubated for 10 min on ice. Nuclei were recovered by centrifugation at 1,300 g for 5 min at 4°C and resuspended in 1× Nuclei Buffer (10× Genomics).
scATAC-seq and scRNA-seq libraries were generated using the Chromium Next GEM Single Cell Multiome ATAC + Gene Expression Reagent Kit (10× Genomics), as described by the manufacturer. Approximately, 10,000 nuclei per sample were loaded into a Chromium Single Cell Controller (10× Genomics) as initial input. Quantification of the library was carried out using the Qubit dsDNA HS Assay Kit (Life Technologies), and cDNA integrity was assessed using D1000 ScreenTapes (Agilent Technologies). Paired-end (28 + 90 bp) sequencing (100 cycles) was used for the scRNA-seq libraries while paired-end (50 + 50 bp) sequencing (100 cycles) was used for the scATAC-seq libraries, both performed with a NovaSeq 6000 device (Illumina).

Single-cell multi-omics sequencing data analysis
Raw sequencing data were processed with the Cell Ranger software (version 2.1.0) from 10× Genomics, and downstream analysis was performed using the Seurat (version 4.0.5) (Stuart et al, 2019) and Signac (version 1.5.0) (Stuart et al, 2021) packages. Low-quality cells were filtered out using Signac by removing those with less than 1,000 or more than 50,000 UMIs in the scRNA-seq and those with less than 1,000 or more than 100,000 counts in the scATAC-seq data. Furthermore, we also filtered out cells with a higher nucleosome signal than 2 and a transcriptional start site (TSS) enrichment lower than 1. The final datasets thus contained 2,851 and 2,714 single cells. We then used MACS2 to call the ATAC peaks on each sample independently. To ensure comparability between samples, we created a common set of peaks by merging all intersecting peaks using the GenomicRanges package (version 1.46.1) (Lawrence et al, 2013).
To avoid batch effects, we integrated the scRNA-seq datasets from the two samples using the standard protocol described in the Seurat package (Stuart et al, 2019) and as described above. We used default parameters and 30 CCA dimensions for the integration. Data dimensionality was reduced, and cell embeddings were calculated for both scRNA-seq and scATAC-seq data for the integrated dataset using PCA and latent semantic indexing (LSI), respectively. Then, the cell embeddings were integrated using the IntegrateEmbeddings() function from Signac using the anchors identified for the data integration of the scRNA-seq data. Lastly, we generated a joint dimensional reduction combining both scATAC-seq and scRNA-seq data using the weighted nearest neighbor method from Seurat and using 50 dimensions for each assay.
Unsupervised clustering of the integrated data was performed on the scRNA-seq data using 50 PCA dimensions and 0.4 resolution, which resulted in 10 cell clusters that were visualized by UMAP. Cell-type identity was established by transferring the cell labels from the reference scRNA-seq dataset containing more than 30,000 epidermal cells. To that end, we used 30 PCA dimensions for identifying the transfer anchors and for transferring the celltype labels to the multiome dataset. Differentially accessible regions between undifferentiated (cluster 2) and differentiated keratinocytes (cluster 3) were identified using the FindMarkers() function. To assess the histone modification landscape at differentially accessible regions, we made use of previously published ChIP-seq data of histone modifications generated on NHEK cells (ENCODE) and available at the UCSC genome annotation database. The analysis was performed using the ChipPeakAnno (v.3.28.1) package .
Cis-regulatory interactions at the epidermal differentiation complex (EDC) were predicted by identifying co-accessible peaks in undifferentiated (cluster 2) and differentiated keratinocytes (cluster 3) independently using Cicero (v.1.3.6) (Pliner et al, 2018). Only Cicero connections with a co-accessibility score higher than 0.25 were plotted.
Overrepresented TF-binding motifs in differentially accessible peaks were identified using the FindMotifs() function with default parameters. In addition, motif activity was also calculated in each individual cell using chromVAR (Schep et al, 2017). Then, cell-typespecific gene markers and active motifs were identified using a Wilcoxon rank-sum test and the area under the receiver operator curve (auROC) with the presto package (v.1.0.0). Cell-type-specific transcription factors were obtained by ranking transcription factors by the average AUC statistic from both gene expression and motif activity for each cluster.
Regulatory gene network inference was performed using pySCENIC (v0.11.2) (Kumar et al, 2021) with default parameters in Phython (v3.7), following the previously described protocol (Aibar et al, 2017;Kumar et al, 2021). In brief, potential regulatory interactions were inferred based on the expression of predefined human transcription factors and their target genes in the preprocessed gene expression data from keratinocyte clusters defined by the single-cell multi-omics approach, using the GRNBoost2 algorithm. These interactions were then used to calculate TF-gene co-expression modules, which were subsequently subjected to motif enrichment analyses. Thus, only target genes that contained the corresponding TFbinding site were kept in the module. The activity of the resulting regulons was then assessed in individual cells using the AUCell score method. Lastly, to identify cell-type-specific regulons, we calculated a Z-score as previously described, and the top five regulons per cell type were displayed as a heatmap.

Single-cell RNA sequencing
A healthy whole skin biopsy was obtained from a 30 y/o male donor of fair-skin type and preserved in MACS Tissue Storage Solution (Miltenyi Biotec). The skin sample was subsequently cut into small pieces that were further processed using the Epidermis Dissociation Kit, human (Miltenyi Biotec) and the Gentle MACS Dissociator (Miltenyi Biotec), following the manufacturer's instructions. The resulting cell suspension was then filtered through a 70-lm cell strainer (Falcon) and depleted of apoptotic and dead cells with the Dead Cell Removal Kit (Miltenyi Biotec).
We used the 10× Genomics platform to generate a sequencing library with the Chromium Single Cell 3' Reagent Kit, v2 chemistry (10× Genomics), as described by the manufacturer. Approximately, 20,000 cells were loaded into a Chromium Single Cell Controller (10× Genomics) as initial input. Quantification of the library was carried out using the Qubit dsDNA HS Assay Kit (Life Technologies), and cDNA integrity was assessed using D1000 ScreenTapes (Agilent Technologies). Paired-end (26 + 74 bp) sequencing (100 cycles) was finally performed with a HiSeq 4000 device (Illumina).

Single-cell RNA sequencing data analysis
Raw sequencing data were processed with the Cell Ranger software (version 2.1.0) from 10× Genomics, and downstream analysis was performed using the Seurat package (version 3.1.1) (Stuart et al, 2019). A total of 7,752 cells passed the quality control of Cell Ranger. Further filtering of low-quality cells was carried out using Seurat by removing those expressing < 200 genes or more than 2,500, as well as cells expressing more than 5% of mitochondrial genes. The final dataset thus contained 7,143 single-cell transcriptomes. Unsupervised cell clustering was performed using 20 PCA dimensions and 0.5 resolution, which resulted in 10 clusters, and visualized as uniform manifold approximation and projection (UMAP) plots. Each cluster's representative gene markers were identified using the FindAllMarkers() function.
To combine our scRNA-seq sample with the three previously published abdominal epidermis samples (Cheng et al, 2018a;Data ref: Cheng et al, 2018b), we performed sample integration following Seurat's standard protocol (Stuart et al, 2019). First, gene expression in each cell was normalized using a log-normalization of the Unique Molecular Identifier (UMI) counts for each sample independently. Also, the 2,000 most variable genes per sample were identified. These features were then used to find common anchors using FindIntegrationAnchors(), with default parameters and 30 canonical correlation analysis (CCA) dimensions. Final integration was subsequently achieved using these anchors in IntegrateData(), with 30 CCA dimensions and default parameters.
Unsupervised clustering of the integrated data was performed using 30 PCA dimensions and 0.4 resolution, which resulted in 13 cell clusters that were visualized by UMAP projection. Cell-type identity was established comparing the most representative genes found by FindAllMarkers() and literature-based gene markers.

RNA velocity analysis
Spliced and unspliced reads from the in-house scRNA-seq dataset were obtained by running the command line interface of velocyto (version 0.17) (la Manno et al, 2018). For this analysis, we only used the in-house dataset, as the not preprocessed raw data were not available for the rest of the samples. Data were then preprocessed by normalization, log-transformation, and identification of highly variable genes, before calculating the RNA velocity. Additionally, melanocytes and immune cells were removed from the dataset, resulting in 7,068 keratinocytes that were used for further analysis. RNA velocity was then estimated by the generalized dynamical model of scVelo (version 0.2.3) , using the recover_dynamics() function with 100 maximum iterations. Lastly, RNA velocities were projected and visualized onto the UMAP embedding calculated by Seurat. The latent time was calculated using the default values and manually assigning the root cells to cluster Basal 1, based on the RNA velocity results.

DNA methylation analysis
DNA methylation data from epidermal tumors were obtained using Infinium MethylationEPIC BeadChips (Illumina), according to the manufacturer's protocols, at the Genomics and Proteomics Core Facility of the German Cancer Research Center (DKFZ, Heidelberg, Germany). For FFPE samples, a special restoration protocol was applied during sample preparation to ensure good sequencing quality, following the manufacturer's instructions.
Data processing and analysis were performed using the R Bioconductor package minfi (version 1.34.0) (Aryee et al, 2014), as previously described (Rodr ıguez-Paredes et al, 2018a). Briefly, raw sequencing data were preprocessed by filtering out probes located in sex chromosomes as well as low-detected, self-hybridizing, and SNPassociated CpGs. Data normalization was performed using the function preprocessFunnorm(). Methylation levels for each individual CpG were displayed as b values, which are calculated using the ratio of methylated and unmethylated intensities per locus (b = methylated/ (methylated + unmethylated + 100)). Detection of differentially methylated probes (DMPs) was carried out by fitting a linear model and using an empirical Bayes method for statistical testing. Multiple testing was corrected using the Benjamini-Hochberg method, and DMPs were filtered by significance threshold (P-value < 0.05, F-test).
Gene Ontology analysis using differentially methylated probes was performed using the R package methylGSA (version 1.6.1) (Ren & Kuan, 2019). As promoters, we used CpG probes corresponding to the TSS1500, TSS200, 1stExon, and 5 0 UTR locations obtained from the UCSC reference group of the Illumina annotation, contained in the IlluminaHumanMethylationEPICanno.ilm10b4.hg19 R package (Hansen, 2017).
Mitotic age and stem cell division rates (SCDR) were calculated using the publicly available epiTOC2 R Script with default parameters (Teschendorff, 2020).

DNA methylation-based stratification of KC samples
CpG probes located at the set of differentially accessible regions between undifferentiated and differentiated keratinocytes identified by scATAC-seq and contained in the Infinium MethylationEPIC were identified (n = 4,351). The number of CpG probes detected in peaks from undifferentiated keratinocytes was significantly higher (2,925 probes) than in peaks from differentiated keratinocytes (1,426 probes). This was most likely due to the fact that peaks in undifferentiated keratinocytes cover higher DNA sequences in general than those in differentiated keratinocytes. Furthermore, peaks in differentiated keratinocytes are enriched for enhancers, usually associated with intergenic regions, which are underrepresented in EPIC arrays.
Then, methylation values for CpG probes located in the same region were averaged for each sample. Tumor stratification was performed by hierarchical clustering using complete-linkage and Euclidean distances. Clustering was then visualized as heatmaps or dendrograms.
We used healthy epidermis methylation as a control as it mostly represents the methylome of terminally differentiated keratinocytes. Thus, we combined each previously published dataset with the 12 healthy epidermis samples from our cohort. Raw methylation data were not available for the dataset published in AL Eitan et al., so we used processed mean b values per group as provided in the original publication (Al-Eitan et al, 2020). Stratification of these datasets was performed as described above.

Tn5 activity assessment and transposome assembly
For the sci-MET experiment, a homemade Tn5 transposase provided by the Protein Expression and Purification Core Facility at the European Molecular Biology Laboratory (EMBL; Heidelberg, Germany) was used, and its activity was assessed as follows. First, forward linker oligonucleotides (FC121-1030 and FC121-1031) and the reverse linker Tn5MERev, obtained from (Picelli et al, 2014), were resuspended at 100 lM in EB Buffer (Qiagen). Each forward linker was then combined with the reverse oligo at a 1:1 ratio and subsequently annealed in a thermocycler (95°C for 5 min, cooldown to 65°C, 65°C for 5 min, cool-down to 4°C). Cool-down steps were performed at a À0.1°C/s rate. Next, transposomes were assembled by adding 0.5 ll of each annealed linker to 10 ll of the Tn5 stock, followed by an incubation at 23°C for 1 h in a thermocycler.
After assembly, transposomes were diluted in 50% glycerol at different concentrations to assess the proper working concentration. Tagmentation was performed in a 5 ll reaction containing 2.5 ll of 2× Tagmentation Buffer (20 mM Tris-HCl pH 7.5, 20 mM MgCl 2, 50% DMF), 1.25 ll of the diluted Tn5, and 150 pg of cDNA as input. Samples were incubated at 55°C for 3 min and subsequently cooled down to 10°C. Then, reactions were stopped by adding 1.25 ll of 0.2% SDS followed by a 5 min incubation at room temperature. Tagmented cDNA amplification was directly carried out by adding 10 ll of the following PCR mixture: 6.75 ll of KAPA HiFi HotStart ReadyMix (Roche), 0.75 ll of DMSO, 1.25 ll of 10 lM Illumina i7 adapter (5 0 -CAAGCAGAAGACGGCATACGAGATGTCTCGT GGGCTCGG), and 1.25 ll of 10 lM Illumina i5 adapter (5 0 -AA TGATACGGCGACCACCGAGATCTACACTCGTCGGCAGCGTC). The amplification reaction was performed by an incubation for 3 min at 72°C and 30 s at 95°C, followed by 12 cycles of 20 s at 98°C, 15 s at 58°C and 30 s at 72°C, ending with a final incubation of 30 s at 72°C.
Amplified DNA was then cleaned-up using AMPureXP Beads (Beckman Coulter) at 1× volume. Samples were incubated for 5 min at room temperature and then placed in a magnetic rack where beads were washed in 80% EtOH. Lastly, DNA was eluted in 10 ll of H 2 O. The Tn5 activity was assessed by checking DNA quantification and fragment size distribution using the Qubit dsDNA HS Assay Kit (Life Technologies) and D5000 ScreenTapes (Agilent Technologies), respectively. The 1:50 dilution was found to provide the highest yield with an adequate fragment size distribution for subsequent sequencing, so it was finally used for the sci-MET experiment.
To assemble the sci-MET transposomes, the 96 unique Cytosinedepleted linkers and the reverse complement primer described in Mulqueen et al (2018) were resuspended at 100 lM in EB buffer (Qiagen) and combined at a 1:1 ratio. Linkers were then annealed as described above and diluted at 1:50 in EB Buffer (Qiagen). The Tn5 stock transposase was also diluted at 1:50 in 50% glycerol. Then, transposomes were assembled by adding one volume of the diluted linkers to 10 volumes of the diluted transposase (i.e., 1 ll of diluted linker to 10 ll of diluted Tn5), followed by an incubation for 1 h at 23°C and room temperature. Assembled transposomes were stored at À20°C.

Single-cell combinatorial indexing for methylation analysis (sci-MET)
A healthy whole human skin biopsy was obtained from the ilioinguinal region of a 62 y/o male donor of fair-skin type. The sample Ó 2022 The Authors Molecular Systems Biology 18: e11073 | 2022 was immersed in MACS Tissue Storage solution (Miltenyi Biotec) immediately after resection and was kept on ice until further processing. The single-cell suspension from the epidermis was obtained using the Epidermis Dissociation Kit, human (Miltenyi Biotec), following the manufacturer's instructions. We then followed the previously published sci-MET protocol (Mulqueen et al, 2018), with important modifications. Hence, cells were then fixed by incubation with 1.5% formaldehyde (without methanol) in 1 ml of PBS for 10 min, with gentle shaking. The reaction was stopped by adding 80 ll of 2.5 M glycine followed by 5 min incubation on ice. The sample was centrifuged at 550 g for 10 min at 4°C to recover fixed cells.
Nuclear isolation was performed using the lysis buffer recipe described in (Wysocka et al, 2001) and as described for the single-cell multi-omics dataset. Recovered nuclei were subsequently subjected to nucleosome depletion. To do so, fixed nuclei were resuspended in 800 ll of 1× NEBuffer 2.1 (New England Biolabs) supplemented with 0.3% SDS, and incubated for 30 min at 42°C with intense shaking. Then, 200 ll of 10% Triton X-100 was added to the sample, which was incubated for 30 min at 42°C with intense shaking. Nuclei were then centrifuged at 500 g for 5 min at 4°C and filtered through a 40-lm cell strainer. The nuclear stain TO-PRO TM -3 Iodide (1:10,000; Invitrogen) was then added, and we subsequently proceeded to fluorescence-activated nuclei sorting (FANS).
In this second sorting, we sorted 22 nuclei/well in a full 96well plate containing 5 ll of Zymo Digestion Reagent (2.5 ll of Zymo M-Digestion Buffer, 2.25 ll of nuclease-free H 2 O, and 0.25 ll of Proteinase K), following the manufacturer's instructions of the EZ-96 DNA Methylation-Direct MagPrep Kit (Zymo Research). Only 10 nuclei were sorted in three wells, where 35 pg of a barcoded-lambda DNA were added as spike-in controls. The plate was incubated for 4 h at 50°C for nuclear digestion. Bisulfite conversion was then performed by adding 32.5 ll of Zymo CT Conversion Reagent in each well and incubating the plate according to the manufacturer's instructions (98°C for 8 min and 64°C for 3.5 h).
After conversion, bisulfite-converted DNA was desulphonated, washed, and recovered following the manufacturer's instructions. In brief, 5 ll of Zymo M-Binding Beads diluted in 150 ll of Zymo M-Binding Buffer was added to each well, and the plate was incubated for 5 min at room temperature for allowing the DNA to attach to the beads. Beads were then washed with 80% ethanol, resuspended in 50 ll of Zymo M-Desulphonation Buffer, and incubated at room temperature for 15 min. Beads were then washed again in 80% ethanol, and DNA was subsequently eluted in 25 ll of Zymo M-Elution Buffer. Eluted DNA was transferred to a 96-well plate containing 16 ll of nuclease-free H 2 O, 5 ll of 10× NEBuffer 2.1 (New England Biolabs), 2 ll of 10 mM dNTP mix, and 2 ll of 9-nucleotide random primers (Mulqueen et al, 2018). DNA was subjected to four rounds of linear amplification in a thermocycler by incubating at 95°C for 45 s to achieve denaturation. The plate was quickly placed on ice, and 10 U of Klenow (3 0 -5 0 exo-) polymerase (Biozym Scientific) was added per well. The plate was placed again in the thermocycler for incubation at 4°C for 5 min, followed by incubation at 37°C for 90 min. Importantly, the temperature was increased to 37°C by 1°C/15 s. In each amplification round, fresh reagents were added to each well (1.25 ll of 4× NEBuffer 2.1(New England Biolabs), 1 ll of 10 lM dNTP mix, and 1 ll of 9-nucleotide random primers (Mulqueen et al, 2018)) as well as 10 U of Klenow (3'-5' exo-) polymerase (Biozym Scientific).
Amplified material was subsequently purified using AMPureXP beads (Beckman Coulter) at a 1.1× volume. DNA was attached to the beads, washed with 80% ethanol, and eluted in 21 ll of EB Buffer (Qiagen). Eluted DNA was added to a 96-well plate containing 25 ll of 2× KAPA HiFi HotStart ReadyMix (Roche), 2 ll of 10 lM i7 index PCR primer, and 2 ll of i5 index PCR primer. Index PCR reaction was performed in a thermocycler by incubating at 95°C for 2 min, then performing 18 cycles at 94°C for 80 s and, finally, incubating at 65°C for 30 s and at 72°C for 30 s. The number of cycles was determined by a qPCR reaction performed with additional wells and could be up to 21 cycles. Libraries were then pooled to achieve a particular cell number and purified using a double-sided size selection step with AMPureXP Beads (0.6×-0.8×; Beckman Coulter). The library was subsequently quantified with Qubit dsDNA HS Assay Kit (Invitrogen), and fragment size was assessed using D5000 or D1000 ScreenTapes (Agilent Technologies). Paired-end (150 bp) sequencing (100 cycles) was finally performed with a NextSeq550 High-Output (Illumina) system following a custom-made recipe (Read 1: 100 imaged cycles; Index Read 1: 10 imaged cycles; Index Read 2: 11 imaged cycles, 16 dark cycles and 10 imaged cycles). Primer sequences for library preparation and sequencing are shown in Appendix Table S1.

sci-MET data analysis
All single cells displaying a minimum of 100,000 sequencing reads were kept for further analysis, which resulted in 554 cells. Reads corresponding to these cells were trimmed by removing stretches of bases with a quality score of < 30 at the end of the reads. Trimmed reads were mapped using bsmap (Xi & Li, 2009). As a reference sequence for the bisulfite mapping, we used the hg19 assembly of the human genome. In order to achieve comparability with the binary nature of single-cell methylation values, the b values of bulk methylomes were binarized by setting every value ≥ 0.5 to 1 and every value < 0.5 to 0. Methylation data of all 554 single cells were then combined with the binarized data of the 67 bulk DNA methylomes from AK, cSCC and healthy epidermal samples, and a 2dimensional multidimensional scaling (MDS) analysis was performed. For the ChromHMM analysis, we used the Chromatin State Segmentation by HMM from ENCODE/Broad for epidermal keratinocytes (NHEK) (Ernst et al, 2011), provided by the UCSC genome server (http://genome.ucsc.edu). Methylation values of all CpGs located within each of the 15 types of genomic segments were averaged for each sci-MET cell-type and EPIC tumor subclass.

Bulk DNA methylation data deconvolution
Cell fraction estimation in bulk epidermal tumor methylation data was performed using the EpiSCORE R package (version 0.9.2) . We used 15,000 keratinocytes from the integrated scRNA-seq dataset as the input to generate a gene expression reference matrix with a maximum marker specificity score (MSS) of 3. Only keratinocyte populations involved in the differentiation trajectory defined by RNA velocity and also present in the single-cell multi-omics dataset were included in this analysis. The gene expression reference matrix comprised 784 marker genes and was subsequently validated on a downsampled dataset containing 10,000 keratinocytes, showing an overall cell prediction accuracy of 74%. EpiSCORE then compared the expression-based reference gene matrix to two available datasets containing paired gene expression and DNA methylation data (Epigenomics Roadmap and SCM2) to build an imputed reference DNA methylation matrix for human epidermis, which contained 74 genes. This imputed matrix was then used to estimate cell fractions in our bulk epidermal tumor methylation datasets.

Immunofluorescence stainings
Immunofluorescence stainings were performed with 4 lm sections obtained from nine cSCC and one in situ cSCC (BD) previously stratified based on their cells-of-origin, corresponding to five keratinocyte-like and five EpSC-like tumors. Briefly, sections were deparaffinized in xylene and rehydrated in a gradient of ethanol and distilled water prior to heat-induced antigen retrieval. To this end, slides were incubated for 30 min at 95°C in a water bath in 10 mM citrate buffer (pH 6.0) containing 0.05% Tween-20. Subsequently, non-specific antibody binding was blocked by incubation with 10% normal goat serum for 1 h, followed by overnight incubation with primary antibodies diluted in blocking solution at 4°C. Primary antibodies used were rabbit anti-ZEB2 (Sigma-Aldrich, HPA003456, 1:100) and mouse anti-TP63 (Abcam, ab735, 1:100). After washing with PBS with 0.1% Tween-20, sections were then incubated with corresponding Alexa Fluor-conjugated secondary antibodies (Life Technologies) for 2 h at room temperature. Nuclear counterstaining was performed with DAPI, and slides were mounted using ProLong Gold Antifade Mountant (ThermoFisher). Images were taken with an Olympus VS200 slide scanner system (Olympus) using a 40× oil immersion lens and were further processed using the Fiji software (Schindelin et al, 2012).
The percentage of ZEB2-positive tumor nuclei (TP63-positive) was calculated in three different regions per sample, counting at least 500 tumor cells per region. Statistical analysis was performed using an unpaired two-sided t-test using each region as an independent value.

Data availability
The datasets generated in this study are available from the following databases:  A Uniform manifold approximation and projection (UMAP) plot depicting single-cell transcriptomes from healthy sun-protected human epidermis (n = 4). Each dot represents a single cell (n = 32,272). Colors depict the six archetypical keratinocyte populations described in the text, as well as other minority cell types (Cheng et al, 2018a;Data ref: Cheng et al, 2018b). B Average expression of the top five gene markers defining each cell population projected on the UMAP plot. Red indicates maximum average gene expression, while blue indicates low or no expression of a particular set of genes in log-normalized UMI counts. C Left: RNA velocities calculated using the 7,068 keratinocytes from the in-house generated dataset of healthy human epidermis, projected onto the UMAP embedding.
Right: UMAP plot displaying the latent time calculated by scVelo.
▸ Figure EV3. Cell-of-origin-based tumor stratification strategy can be expanded to other epidermal tumor entities.
A-E Heatmaps displaying unsupervised hierarchical clustering based on the methylation patterns at differentiated keratinocyte-specific peaks of ( A Scatter plots of pairwise comparisons between AK, BD, and cSCC methylomes from the keratinocyte-like (upper) or EpSC-like (lower) subclass. B Scatter plot comparing AK, BD, and cSCC methylomes from each cell-of-origin subclass as a unique entity. C Top five enriched Gene Ontology (GO) terms using genes with differentially methylated promoter regions between AK/BD/cSCC samples from distinct cell-of-origin subclasses.
Data information: In the scatter plots, significantly differentially methylated CpG probes (P-value < 0.05, F-test) are depicted in blue. AK: actinic keratosis, BD: Bowen's disease, cSCC: cutaneous squamous cell carcinoma, DMP: differentially methylated probes. A Uniform manifold approximation and projection (UMAP) plot depicting single-cell transcriptomes from healthy sun-protected human epidermis (n = 4). Each dot represents a single cell (n = 32,272). Colors depict the six archetypical keratinocyte populations described in the text, as well as other minority cell types (Cheng et al, 2018a;Data ref: Cheng et al, 2018b). B Average expression of the top five gene markers defining each cell population projected on the UMAP plot. Red indicates maximum average gene expression, while blue indicates low or no expression of a particular set of genes in log-normalized UMI counts. C Left: RNA velocities calculated using the 7,068 keratinocytes from the in-house generated dataset of healthy human epidermis, projected onto the UMAP embedding.

EV4
Right: UMAP plot displaying the latent time calculated by scVelo.
▸ Figure EV3. Cell-of-origin-based tumor stratification strategy can be expanded to other epidermal tumor entities.
A-E Heatmaps displaying unsupervised hierarchical clustering based on the methylation patterns at differentiated keratinocyte-specific peaks of ( A Scatter plots of pairwise comparisons between AK, BD, and cSCC methylomes from the keratinocyte-like (upper) or EpSC-like (lower) subclass. B Scatter plot comparing AK, BD, and cSCC methylomes from each cell-of-origin subclass as a unique entity. C Top five enriched Gene Ontology (GO) terms using genes with differentially methylated promoter regions between AK/BD/cSCC samples from distinct cell-of-origin subclasses.

Table of Contents
Appendix Figure S1.